Speech Retrieval under Limited Resources and Open Domain Conditions
Abstract
Speech Retrieval focuses on retrieving the segment of speech in a speech corpus that corresponds to a given query. A standard Speech Retrieval system is usually composed of two systems: an Automatic Speech Recognition (ASR) system and an Information Retrieval (IR) system. The ASR system transcribes the speech and represents the transcript in different formats. The transcript is then indexed and searched by the IR system. As a result, Speech Retrieval is sensitive to ASR quality, since the IR system depends on the transcript generated by the ASR system. The current challenge in Speech Retrieval is the limited performance of ASR under certain conditions. Two such conditions are Limited Resources and Open Domain. Under the Limited Resources condition, the training data is not sufficient for building a robust ASR system. A good example is Speech Retrieval on limited-resource languages such as Tagalog or Assamese. Under the Open Domain condition, on the other hand, the recorded speech varies in many respects. The high diversity of the recorded speech limits the performance of a single ASR system. A good example is Speech Retrieval on YouTube videos, such as online lectures. We believe Speech Retrieval under these conditions can be significantly improved through two approaches. The first is to apply extra information, such as context from the conversation. A context includes the other words in the same utterance or conversation. The second is to refine the existing IR system by using a better search strategy for Speech Retrieval. We analyze existing IR systems and present a better search strategy, which is based on the diversity of current approaches. We have investigated how to integrate these two approaches into Speech Retrieval, and determined that they improve Spoken Term Detection (STD) under the limited resources condition.
The resulting system has been shown to be effective on multiple languages, implying that the improvement is language independent. Based on our positive results for the limited resources condition, we propose to extend existing approaches and develop new techniques for better Speech Retrieval under the open domain condition. We propose a new Speech Retrieval task called Spoken Snippet Retrieval (SSR), which retrieves a moderately sized segment of speech from the speech collection with just enough context. The retrieved snippet is easier for a user to listen through than the spoken documents retrieved by Spoken Document Retrieval (SDR) systems, which have an average length of 3 minutes. The snippet is also more comprehensible than the term locations detected by STD systems, since the context is given. The main contribution of the thesis is to accomplish SSR on open domain data, which we believe amounts to performing the appropriate kind of retrieval on the appropriate data.
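To make the pipeline described in the abstract concrete, the following is a minimal illustrative sketch, not the thesis's actual system. It assumes the ASR output is a single best word sequence with start and end times per utterance (real systems often index richer formats), and every name in it (`transcript`, `build_index`, `detect_term`, `snippet`) is hypothetical. It shows an IR-side inverted index over the transcript, an STD-style lookup that returns term locations, and an SSR-style lookup that returns the hit together with a small window of surrounding words.

```python
from collections import defaultdict

# Hypothetical ASR output: for each utterance, a list of (word, start_sec, end_sec).
transcript = [
    ("utt1", [("speech", 0.0, 0.4), ("retrieval", 0.4, 1.0),
              ("is", 1.0, 1.1), ("hard", 1.1, 1.5)]),
    ("utt2", [("retrieval", 0.2, 0.8), ("needs", 0.8, 1.1),
              ("context", 1.1, 1.7)]),
]


def build_index(transcript):
    """Inverted index: word -> list of (utterance id, word position)."""
    index = defaultdict(list)
    for utt_id, words in transcript:
        for pos, (word, _start, _end) in enumerate(words):
            index[word.lower()].append((utt_id, pos))
    return index


def detect_term(index, transcript, term):
    """STD-style lookup: return (utterance id, start, end) for each hit."""
    by_id = dict(transcript)
    hits = []
    for utt_id, pos in index.get(term.lower(), []):
        _word, start, end = by_id[utt_id][pos]
        hits.append((utt_id, start, end))
    return hits


def snippet(transcript, utt_id, pos, window=1):
    """SSR-style result: the hit plus `window` words of context on each side.

    Returns (text, start_sec, end_sec) for the whole snippet.
    """
    words = dict(transcript)[utt_id]
    lo = max(0, pos - window)
    hi = min(len(words), pos + window + 1)
    span = words[lo:hi]
    text = " ".join(word for word, _start, _end in span)
    return text, span[0][1], span[-1][2]


index = build_index(transcript)
hits = detect_term(index, transcript, "retrieval")
context = snippet(transcript, "utt1", 1, window=1)
```

Under these assumptions, `hits` holds only the time span of each occurrence of the query term, while `context` also carries the neighboring words, which is the difference between a bare term location and a snippet that the abstract draws. Because the index is built directly from the transcript, any ASR error propagates into both lookups.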
Similar resources
Open Source Speech and Language Resources for Frisian
In this paper, we present several open source speech and language resources for the under-resourced Frisian language. Frisian is mostly spoken in the province of Fryslân which is located in the north of the Netherlands. The native speakers of Frisian are Frisian-Dutch bilingual and often code-switch in daily conversations. The resources presented in this paper include a code-switching speech da...
A Novel Frequency Domain Linearly Constrained Minimum Variance Filter for Speech Enhancement
A reliable speech enhancement method is important for speech applications as a pre-processing step to improve their overall performance. In this paper, we propose a novel frequency domain method for single channel speech enhancement. Conventional frequency domain methods usually neglect the correlation between neighboring time-frequency components of the signals. In the proposed method, we take...
Linguistic representation of Finnish in a limited domain speech-to-speech translation system
This paper describes the development of Finnish linguistic resources for use in MedSLT, an Open Source medical domain speech-to-speech translation system. The paper describes the collection of the medical sub-domain corpora for Finnish, the creation of the Finnish generation grammar by adapting the original English grammar, the composition of the domain specific Finnish lexicon and the definiti...
A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain
Quality of speech signal significantly reduces in the presence of environmental noise signals and leads to the imperfect performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, the single channel speech enhancement of the corrupted signals by the additive noise signals is considered. A dictionary-based algorithm is proposed to train the speech...
Missing-feature reconstruction for band-limited speech recognition in spoken document retrieval
In spoken document retrieval, it is necessary to support a variety of audio corpora from sources that have a range of conditions (e.g., channels, microphones, noise conditions, recording media, etc.). Varying band-limited speech represents one of the most challenging factors for robust speech recognition. The missing-feature reconstruction method shows the effectiveness in recognition of the sp...
Publication date: 2014